Midterm Review
| Model | Response | Distribution | Parameter | Link function |
|---|---|---|---|---|
| MLR | continuous | Normal | mean of response | identity |
| Logistic | binary | Bernoulli | probability of success | logit (log-odds) |
| Poisson | counts | Poisson | mean count | log |
mean(house price) = \(\beta_0\) + \(\beta_1\)size + \(\beta_2\)basementY
logit(prob. of default) = \(\beta_0\) + \(\beta_1\)balance + \(\beta_2\)income + \(\beta_3\)studentY
log(mean(number of bikes)) = \(\beta_0\) + \(\beta_1\)temperature
These link functions (identity, logit, log) determine how the regression coefficients are interpreted!
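As a sketch of why the link matters, the three example models can be rewritten on the scale of the parameter by inverting each link. The variable names are the ones used in the examples above; nothing here is estimated from data.

\[
\begin{aligned}
\text{mean(house price)} &= \beta_0 + \beta_1\,\text{size} + \beta_2\,\text{basementY} \\
\text{odds of default} &= e^{\beta_0 + \beta_1\,\text{balance} + \beta_2\,\text{income} + \beta_3\,\text{studentY}} \\
\text{mean(number of bikes)} &= e^{\beta_0 + \beta_1\,\text{temperature}} = e^{\beta_0}\,\left(e^{\beta_1}\right)^{\text{temperature}}
\end{aligned}
\]

With the identity link a 1-unit increase adds \(\beta_1\) to the mean, while with the logit and log links it multiplies the odds or the mean count by \(e^{\beta_1}\).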
| Model | continuous covariate |
|---|---|
| MLR | an increase of 1 unit in the covariate is associated with an estimated increase/decrease of \(\lvert\hat{\beta}_1\rvert\) in the mean response |
| Logistic | an increase of 1 unit in the covariate is associated with an estimated increase/decrease in the log-odds of success of \(\lvert\hat{\beta}_1\rvert\), or a change in the odds of success by a factor of \(e^{\hat{\beta}_1}\) |
| Poisson | an increase of 1 unit in the covariate is associated with an estimated increase/decrease in the log of the mean count of \(\lvert\hat{\beta}_1\rvert\), or a change in the mean count by a factor of \(e^{\hat{\beta}_1}\) |
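The "by a factor of" and "percent" wordings in the table can be checked with a little arithmetic. The sketch below is written in Python purely for illustration; the function names and the two coefficient values are hypothetical and do not come from any model fitted in the course.

```python
import math

def multiplicative_effect(beta_hat):
    """Factor by which the odds (logistic) or the mean count (Poisson)
    changes for a 1-unit increase in the covariate."""
    return math.exp(beta_hat)

def percent_change(beta_hat):
    """The same effect expressed as a percent change."""
    return 100.0 * (math.exp(beta_hat) - 1.0)

# Hypothetical estimated coefficients, for illustration only.
examples = {"positive coefficient": 0.07, "negative coefficient": -0.5}

for label, beta_hat in examples.items():
    factor = multiplicative_effect(beta_hat)
    change = percent_change(beta_hat)
    print(f"{label}: factor = {factor:.3f}, percent change = {change:.1f}%")
```

A factor above 1 corresponds to an estimated increase and a factor below 1 to an estimated decrease, which is why the sign of \(\hat{\beta}_1\) decides between "increase" and "decrease" in the table.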
estimate or estimated (these are not population quantities, they depend on the sample)
associated with (if the data come from an observational study, causation cannot be established)
“by a factor of” or “times” or “percent” if estimated coefficients are exponentiated
“holding other variables constant at any value” if the model is additive and includes other covariates; otherwise leave this phrase out!
check units!!
for additive models: coefficients are interpreted holding other variables constant at any value
for models with interactions: interpretations depend on levels or values of other variables
Warning
Interaction terms do not model correlations between covariates!!
We use interactions when the association between a covariate and the response depends on one or more other covariates
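A minimal sketch of what this means for interpretation, assuming a linear predictor of the form \(\beta_0 + \beta_1 x + \beta_2 z + \beta_3 xz\). The function name and the coefficient values are hypothetical and chosen only for illustration.

```python
def slope_of_x(b1, b3, z):
    """Change in the linear predictor for a 1-unit increase in x,
    evaluated at a given value of the other covariate z."""
    return b1 + b3 * z

# Hypothetical coefficients, for illustration only.
b1 = 2.0

# Additive model (no interaction, b3 = 0): the slope of x is the same
# at every value of z, so "holding z constant at any value" is valid.
for z in (0, 1, 5):
    print("additive, z =", z, "-> slope of x =", slope_of_x(b1, 0.0, z))

# Model with interaction (b3 != 0): the slope of x changes with z,
# so the interpretation has to state the value or level of z.
for z in (0, 1, 5):
    print("interaction, z =", z, "-> slope of x =", slope_of_x(b1, 0.5, z))
```

When z is a dummy variable, the two values z = 0 and z = 1 give one slope per level of the categorical covariate.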
or “keeping the spending in TV and newspaper advertising constant at any value”
Table from ISLR
We need the sampling distribution (distribution of the estimators of the regression coefficients, \(\hat{\beta}_j\))
For the 3 models, we usually use a Normal approximation of the sampling distribution (details beyond this course)
glance()
The null hypothesis: \(H_0: \beta_j = 0\)
check the significance level
check the alternative hypothesis
the interpretation depends on the meaning of the coefficient
We have statistically significant evidence (p-value < .001) that the mean number of fires is positively associated with temperature.
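Under the Normal approximation of the sampling distribution, a test of \(H_0: \beta_j = 0\) against a two-sided alternative can be sketched as below. This is an illustration in Python, not the course's code: the function name and the numbers are hypothetical, and software for MLR typically uses a t distribution rather than the Normal reference used here.

```python
import math

def wald_z_test(beta_hat, std_error):
    """Test statistic and two-sided p-value for H0: beta_j = 0,
    using a standard Normal reference distribution."""
    z = beta_hat / std_error
    # For a standard Normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical estimate and standard error, for illustration only.
z, p_value = wald_z_test(beta_hat=0.0723, std_error=0.0119)
print(f"z = {z:.2f}, two-sided p-value = {p_value:.2e}")
```

For a one-sided alternative the p-value would use only one tail, which is one reason to check the alternative hypothesis before reporting a result.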
Remember that we never know what the true population parameters are! At a significance level:
or
Recall that once the data is collected, the intervals are not random and we interpret them in terms of confidence, not probabilities!
We are 95% confident that each additional degree in temperature is associated with an increase in the mean number of fires between 5% and 10%.
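An interval like the one above can be obtained by computing a Normal-approximation interval for the coefficient and then exponentiating its endpoints. The sketch below assumes a Poisson model with a log link; the estimate and standard error are hypothetical values chosen so that the result is close to the 5% to 10% example, not output from a fitted model.

```python
import math
from statistics import NormalDist

def wald_ci(beta_hat, std_error, level=0.95):
    """Normal-approximation confidence interval for a coefficient."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2.0)
    return beta_hat - z_star * std_error, beta_hat + z_star * std_error

def percent_change_ci(beta_hat, std_error, level=0.95):
    """Interval for the percent change in the mean count per 1-unit
    increase, obtained by exponentiating the coefficient interval."""
    lower, upper = wald_ci(beta_hat, std_error, level)
    return 100.0 * (math.exp(lower) - 1.0), 100.0 * (math.exp(upper) - 1.0)

# Hypothetical estimate and standard error, for illustration only.
lower_pct, upper_pct = percent_change_ci(beta_hat=0.0723, std_error=0.0119)
print(f"95% CI for the percent change: {lower_pct:.1f}% to {upper_pct:.1f}%")
```

The endpoints are transformed, not the standard error, so the interval on the percent scale is generally not symmetric around the point estimate.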
We can use bootstrapping to compute CIs
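One common variant is the percentile bootstrap: resample the observations with replacement, refit the model on each resample, and take percentiles of the re-estimated coefficient. The sketch below does this for the slope of a simple linear regression on simulated data. The function names, the simulated data, and the choice of the percentile method are assumptions made for illustration and may differ from the approach used in the course materials.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of a simple linear regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def bootstrap_percentile_ci(xs, ys, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    slopes.sort()
    alpha = 1.0 - level
    lower = slopes[int((alpha / 2.0) * n_boot)]
    upper = slopes[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper

# Simulated data (not real): y = 2 + 3x + Normal noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(50)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0.0, 2.0) for x in xs]

lower, upper = bootstrap_percentile_ci(xs, ys)
print(f"slope estimate = {ols_slope(xs, ys):.3f}")
print(f"95% bootstrap CI = ({lower:.3f}, {upper:.3f})")
```

Resampling whole (x, y) pairs keeps each response attached to its covariate value, so the resamples mimic drawing new observations from the same population.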
In worksheet_03 and tutorial_03 we used simulations to study the relevance of assumptions made:
Normality: only for MLR; not needed for estimation or for large-sample approximations, but the linear model will be a good fit if the assumption holds.
Homoscedasticity: only for MLR, the spread (or variance) of the errors in a model is the same across all levels of the covariate(s).
Confounding factor: a variable related to both a covariate and the response; it can make it look like there’s an association between them, even if there isn’t.
Multicollinearity: correlation between covariates.
Independence: we assume that observations are independent of each other
Predictions are values of the response computed with the estimated model for fixed values of the covariates:
Residuals are the differences between the observed responses and the fitted values
Fitted values and residuals can be computed for the 3 models: MLR, Logistic and Poisson
However: recall that different quantities can be predicted with logistic (log-odds, odds, probabilities) and Poisson (log-counts, counts)
In Logistic and Poisson, the variance of each observation depends on the covariates (not constant), so residuals are adjusted (e.g., Pearson, deviance)
See tutorial_04 and tutorial_05
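The prediction scales and the adjusted residuals mentioned above can be sketched as follows. This is an illustration in Python with hypothetical function names and input values; only Pearson residuals are shown (deviance residuals are omitted), and the tutorials remain the reference for how these quantities are computed in the course.

```python
import math

def logistic_predictions(log_odds):
    """Convert a predicted log-odds into odds and a probability."""
    odds = math.exp(log_odds)
    probability = odds / (1.0 + odds)
    return {"log_odds": log_odds, "odds": odds, "probability": probability}

def poisson_predictions(log_mean):
    """Convert a predicted log mean count into a mean count."""
    return {"log_mean": log_mean, "mean": math.exp(log_mean)}

def pearson_residual_logistic(y, p_hat):
    """Raw residual divided by the estimated Bernoulli standard deviation."""
    return (y - p_hat) / math.sqrt(p_hat * (1.0 - p_hat))

def pearson_residual_poisson(y, mu_hat):
    """Raw residual divided by the estimated Poisson standard deviation."""
    return (y - mu_hat) / math.sqrt(mu_hat)

# Hypothetical fitted values, for illustration only.
print(logistic_predictions(log_odds=-1.2))
print(poisson_predictions(log_mean=2.3))
print(pearson_residual_logistic(y=1, p_hat=0.25))
print(pearson_residual_poisson(y=9, mu_hat=4.0))
```

Dividing by the estimated standard deviation puts residuals from observations with different variances on a comparable scale.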
© 2026 Gabriela Cohen Freue – Material Licensed under CC BY-SA 4.0